Chromatin Immunoprecipitation Sequencing ◾ 221
samtools view -S -b ENCFF000XJP_chp1_filt.sam > ENCFF000XJP_chp1_filt.bam
samtools view -S -b ENCFF000XJS_chp2_filt.sam > ENCFF000XJS_chp2_filt.bam
samtools view -S -b ENCFF000XKD_chp3_filt.sam > ENCFF000XKD_chp3_filt.bam
BAM files take less storage space than SAM files. Once the conversion has finished, we can
delete the SAM files to save storage space if we need to. Just be careful not to delete the BAM files.
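One cautious way to do this cleanup is to remove a SAM file only after its BAM copy passes a basic integrity check. The following is a minimal sketch, not part of the original workflow: the helper name remove_sam_if_bam_ok is hypothetical, and it assumes a samtools version that provides the “quickcheck” subcommand, which tests that a BAM file has a valid header and an intact end-of-file marker.

```shell
# Hypothetical helper: delete a SAM file only if its BAM copy looks intact.
remove_sam_if_bam_ok() {
    local sam=$1 bam=$2
    # "samtools quickcheck" exits non-zero when the BAM header or EOF marker is bad.
    if command -v samtools >/dev/null 2>&1 && samtools quickcheck "$bam" 2>/dev/null; then
        rm -- "$sam"
    else
        echo "Keeping $sam: $bam did not pass the check" >&2
        return 1
    fi
}

# Apply the check to the three ChIP-Seq SAM files, skipping any that are absent.
for sam in ENCFF000XJP_chp1_filt.sam ENCFF000XJS_chp2_filt.sam ENCFF000XKD_chp3_filt.sam; do
    if [ -f "$sam" ]; then
        remove_sam_if_bam_ok "$sam" "${sam%.sam}.bam" || true
    fi
done
```

A passing quickcheck only shows that the file is structurally complete; it does not compare the alignments against the SAM file.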
Now, we have three BAM files for the three ChIP-Seq samples and one BAM file for the control
data. Before proceeding, we need to know the number of alignments in each file. For each
ChIP-Seq file, we then draw a sample of control reads approximately equal in number to the
reads in that ChIP-Seq file, to serve as its input reads. We do that to avoid library coverage bias.
The following “samtools view” commands count the alignments in each BAM file:
samtools view -c ENCFF000XGP_inp0_filt.bam
samtools view -c ENCFF000XJP_chp1_filt.bam
samtools view -c ENCFF000XJS_chp2_filt.bam
samtools view -c ENCFF000XKD_chp3_filt.bam
Table 6.1 shows the number of aligned reads in each BAM file and the factor, which is the
read count of a ChIP-Seq file divided by the read count of the control file. This fraction is
used to sample input reads from the control file for that ChIP-Seq file.
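As a worked example of this division, the sketch below recomputes the factors from the read counts listed in Table 6.1. It uses awk for the arithmetic, which is a substitution made here for illustration, and the helper name sampling_factor is hypothetical. The results are rounded to six decimal places.

```shell
# Hypothetical helper: fraction of control reads to keep for one ChIP-Seq file.
# Arguments: ChIP-Seq read count, control read count.
sampling_factor() {
    awk -v c="$1" -v i="$2" 'BEGIN { printf "%.6f\n", c / i }'
}

inpc=30923163                      # control read count from Table 6.1
sampling_factor 8942010 "$inpc"    # prints 0.289169
sampling_factor 12748871 "$inpc"   # prints 0.412276
sampling_factor 13217349 "$inpc"   # prints 0.427426
```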
The following commands store the counts in bash variables and then use the “samtools
view” command to draw a subsample of reads from the control file and store them in a
separate control file for that ChIP-Seq file. The “-b” option outputs a BAM file, and the “-s”
option draws a subsample: the fractional part of its value is the fraction of reads to keep,
and the integer part, if given, sets the random seed.
inpc=$(samtools view -c ENCFF000XGP_inp0_filt.bam)
chp1=$(samtools view -c ENCFF000XJP_chp1_filt.bam)
fact1=$(echo "scale=6; $chp1/$inpc" | bc)
samtools view -b -s $fact1 ENCFF000XGP_inp0_filt.bam > ENCFF000XGP_inp0_filt_inp1.bam
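The same steps can be repeated for the other two ChIP-Seq files with a loop. The following is a sketch under stated assumptions rather than the procedure used to produce the results in this section: the helper name sampling_arg, the seed value 11, and the output names ending in _inp2.bam and _inp3.bam are illustrative choices. It relies on “samtools view -s” reading the integer part of its value as the random seed and the fractional part as the fraction of reads to keep.

```shell
# Hypothetical helper: build the SEED.FRACTION value for "samtools view -s".
# Arguments: seed (integer), ChIP-Seq read count, control read count.
sampling_arg() {
    awk -v s="$1" -v c="$2" -v i="$3" 'BEGIN {
        f = c / i
        # A factor at or near 1 means the control is not larger than the ChIP-Seq file.
        if (f > 0.999999) exit 1
        printf "%.6f\n", s + f
    }'
}

ctrl=ENCFF000XGP_inp0_filt.bam
n=0
for chip in ENCFF000XJP_chp1_filt.bam ENCFF000XJS_chp2_filt.bam ENCFF000XKD_chp3_filt.bam; do
    n=$((n + 1))
    # Skip this ChIP-Seq file if it or the control file is not present.
    if [ -f "$chip" ] && [ -f "$ctrl" ]; then
        inpc=$(samtools view -c "$ctrl")
        chpc=$(samtools view -c "$chip")
        if arg=$(sampling_arg 11 "$chpc" "$inpc"); then
            # Assumed output name pattern: ..._inp1.bam, ..._inp2.bam, ..._inp3.bam
            samtools view -b -s "$arg" "$ctrl" > "${ctrl%.bam}_inp${n}.bam"
        fi
    fi
done
```

Fixing the seed makes the subsample reproducible: rerunning the loop with the same seed and fraction selects the same reads.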
TABLE 6.1 Read Count in Each BAM File, the Fraction for Sampling Reads from the Control BAM File, and
Number of Reads in the Control File for Each ChIP-Seq File
Sample                      Read Count    Sampling Factor    Control Read Count
ENCFF000XGP_inp0_filt.bam   30,923,163    N/A                N/A
ENCFF000XJP_chp1_filt.bam   8,942,010     0.289168673        8,941,151
ENCFF000XJS_chp2_filt.bam   12,748,871    0.412275775        12,744,729
ENCFF000XKD_chp3_filt.bam   13,217,349    0.427425519        13,212,672